DySC: software for greedy clustering of 16S rRNA reads

Authors

  • Zejun Zheng
  • Stefan Kramer
  • Bertil Schmidt

Abstract

Pyrosequencing technologies are frequently used to sequence the 16S ribosomal RNA (rRNA) marker gene for profiling microbial communities. Clustering the resulting reads is an important but time-consuming task. We present Dynamic Seed-based Clustering (DySC), a new tool based on the greedy clustering approach that uses a dynamic seeding strategy. Evaluations based on the normalized mutual information (NMI) criterion show that DySC produces higher-quality clusters than UCLUST and CD-HIT at a comparable runtime.

Availability and implementation: DySC, implemented in C, is available at http://code.google.com/p/dysc/ under the GNU GPL license.

Similar resources

CLUSTOM-CLOUD: In-Memory Data Grid-Based Software for Clustering 16S rRNA Sequence Data in the Cloud Environment

High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to different organisms present in the environmental samples. Typically, analysis of microbial diversity in bioinformatics starts from pre-processing followed by clustering 16S rRNA reads into relatively fewer operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial dive...

Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering

BACKGROUND High-throughput bacterial 16S rRNA gene sequencing followed by clustering of short sequences into operational taxonomic units (OTUs) is widely used for microbiome profiling. However, clustering of short 16S rRNA gene reads into biologically meaningful OTUs is challenging, in part because nucleotide variation along the 16S rRNA gene is only partially captured by short reads. The recen...

From sequencing reads to microbial Diversity: bioinformatic Algorithms for Processing amplicon sequencing Data

has revolutionized the field of microbial ecology by offering a cost-efficient method to assess microbial diversity at an unseen depth using 16S rRNA amplicon sequencing approaches. Different preprocessing algorithms need to be performed to obtain a collection of highly reliable sequencing reads, ending with a clustering step to group them into Operational Taxonomic Units (OTUs). However, this a...

Distribution-based clustering: using ecology to refine the operational taxonomic unit.

16S rRNA sequencing, commonly used to survey microbial communities, begins by grouping individual reads into operational taxonomic units (OTUs). There are two major challenges in calling OTUs: identifying bacterial population boundaries and differentiating true diversity from sequencing errors. Current approaches to identifying taxonomic groups or eliminating sequencing errors rely on sequence ...

A heritability-based comparison of methods used to cluster 16S rRNA gene sequences into operational taxonomic units

A variety of methods are available to collapse 16S rRNA gene sequencing reads to the operational taxonomic units (OTUs) used in microbiome analyses. A number of studies have aimed to compare the quality of the resulting OTUs. However, in the absence of a standard method to define and enumerate the different taxa within a microbial community, existing comparisons have been unable to compare the ...

Journal:
  • Bioinformatics

Volume 28, Issue 16

Publication date: 2012